{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "3bJlQu6xfEIi"
   },
   "outputs": [],
   "source": [
    "# Copyright 2020 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9KiDzyYNfEIo"
   },
   "source": [
    "<table align=\"left\">\n",
    "    <td>\n",
    "        <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/optimizer/ai_platform_optimizer_conditional_parameters.ipynb\">\n",
    "            <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Colab logo\"> Run in Colab\n",
    "        </a>\n",
    "    </td>\n",
    "    <td>\n",
    "        <a href=\"https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/optimizer/ai_platform_optimizer_conditional_parameters.ipynb\">\n",
    "            <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\">View on GitHub\n",
    "        </a>\n",
    "     </td>\n",
    "</table>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "VQ3-QAet1Ufd"
   },
   "source": [
    "# Optimizing a Machine Learning model\n",
    "\n",
    "This tutorial demonstrates AI Platform Optimizer conditional objective optimization.\n",
    "\n",
    "### Dataset\n",
    "\n",
    "The Census Income Data Set that this sample uses for training is hosted by the UC Irvine Machine Learning Repository.\n",
    "\n",
    "Using census data which contains a person's age, education, marital status, and occupation (the features), the goal will be to predict whether or not the person earns more than 50,000 dollars a year (the target label). You will train a logistic regression model that, given an individual's information, outputs a number between 0 and 1. This number can be interpreted as the probability that the individual has an annual income of over 50,000 dollars.\n",
    "\n",
    "### Objective\n",
    "\n",
    "This tutorial demonstrates how to use AI Platform Optimizer to optimize the hyperparameter search for machine learning models.\n",
    "\n",
    "This sample implements an automatic learning demo that optimizes a classification model on a census dataset by using AI Platform Optimizer with AI Platform Training built-in algorithms. You will use AI Platform Optimizer to get suggested hyperparameter values and submit model training jobs with those suggested hyperparameter values through AI Platform Training built-in algorithms.\n",
    "\n",
    "### Costs\n",
    "\n",
    "This tutorial uses billable components of Google Cloud:\n",
    "\n",
    "* AI Platform Training\n",
    "* Cloud Storage\n",
    "\n",
    "Learn about [AI Platform Training\n",
    "pricing](https://cloud.google.com/ai-platform/training/pricing) and [Cloud Storage\n",
    "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n",
    "Calculator](https://cloud.google.com/products/calculator/)\n",
    "to generate a cost estimate based on your projected usage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KUopr8jLfEIq"
   },
   "source": [
    "### PIP install packages and dependencies\n",
    "\n",
    "Install additional dependencies not installed in the notebook environment.\n",
    "\n",
    "- Use the latest major GA version of the framework."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "IMVB9pudfEIr"
   },
   "outputs": [],
   "source": [
    "! pip install -U google-cloud\n",
    "! pip install -U google-cloud-storage\n",
    "! pip install -U requests  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "no_execute"
    ]
   },
   "outputs": [],
   "source": [
    "# Automatically restart kernel after installs\n",
    "import IPython\n",
    "app = IPython.Application.instance()\n",
    "app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_jIGPMHofEIu"
   },
   "source": [
    "### Set up your Google Cloud project\n",
    "\n",
    "**The following steps are required, regardless of your notebook environment.**\n",
    "\n",
    "1. [Select or create a Google Cloud project.](https://console.cloud.google.com/cloud-resource-manager)\n",
    "\n",
    "2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)\n",
    "\n",
    "3. [Enable the AI Platform APIs.](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com)\n",
    "\n",
    "4. If you are running this notebook locally, you will need to install [Google Cloud SDK](https://cloud.google.com/sdk).\n",
    "\n",
    "5. Enter your project ID in the cell below. Then run the  cell to make sure the\n",
    "Cloud SDK uses the right project for all the commands in this notebook.\n",
    "\n",
    "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "HThAQCHMfEIv"
   },
   "outputs": [],
   "source": [
    "PROJECT_ID = '[your-project-id]' #@param {type:\"string\"}\n",
    "! gcloud config set project $PROJECT_ID"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UXTRIMMOfEIy"
   },
   "source": [
    "### Authenticate your Google Cloud account\n",
    "\n",
    "**If you are using [AI Platform Notebooks](https://cloud.google.com/ai-platform/notebooks/docs/)**, your environment is already\n",
    "authenticated. Skip these steps."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "H-HHmdH6qAz8"
   },
   "source": [
    "The cell below will require you to authenticate yourself **twice**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "4qH3-CQAfEIz",
    "tags": [
     "no_execute"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "# If you are running this notebook in Colab, run this cell and follow the\n",
    "# instructions to authenticate your Google Cloud account. This provides access\n",
    "# to your Cloud Storage bucket and lets you submit training jobs and prediction\n",
    "# requests.\n",
    "\n",
    "if 'google.colab' in sys.modules:\n",
    "    from google.colab import auth as google_auth\n",
    "    google_auth.authenticate_user()\n",
    "\n",
    "# If you are running this tutorial in a notebook locally, replace the string\n",
    "# below with the path to your service account key and run this cell to\n",
    "# authenticate your Google Cloud account.\n",
    "else:\n",
    "    %env GOOGLE_APPLICATION_CREDENTIALS your_path_to_credentials.json\n",
    "\n",
    "# Log in to your account on Google Cloud\n",
    "! gcloud auth application-default login\n",
    "! gcloud auth login"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "i7E4L93EfEI3"
   },
   "source": [
    "### Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "z4auQSQDfEI3"
   },
   "outputs": [],
   "source": [
    "import json\n",
    "import time\n",
    "import datetime\n",
    "from googleapiclient import errors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "s-b-ycEDPLN8"
   },
   "source": [
    "## Tutorial\n",
    "\n",
    "### Setup\n",
    "\n",
    "This section defines some parameters and util methods to call AI Platform Optimizer APIs. Please fill in the following information to get started."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "LLjuu6hrO_Vw"
   },
   "outputs": [],
   "source": [
    "# Update to your username\n",
    "USER = '[your-user-id]' #@param {type: 'string'}\n",
    "\n",
    "STUDY_ID = '{}_study_{}'.format(USER.replace('-',''), datetime.datetime.now().strftime('%Y%m%d_%H%M%S')) #@param {type: 'string'}\n",
    "REGION = 'us-central1'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "QSIltnu_emGh"
   },
   "outputs": [],
   "source": [
    "def study_parent():\n",
    "  return 'projects/{}/locations/{}'.format(PROJECT_ID, REGION)\n",
    "\n",
    "\n",
    "def study_name(study_id):\n",
    "  return 'projects/{}/locations/{}/studies/{}'.format(PROJECT_ID, REGION, study_id)\n",
    "\n",
    "\n",
    "def trial_parent(study_id):\n",
    "  return study_name(study_id)\n",
    "\n",
    "\n",
    "def trial_name(study_id, trial_id):\n",
    "  return 'projects/{}/locations/{}/studies/{}/trials/{}'.format(PROJECT_ID, REGION, \n",
    "                                                                study_id, trial_id)\n",
    "\n",
    "def operation_name(operation_id):\n",
    "  return 'projects/{}/locations/{}/operations/{}'.format(PROJECT_ID, REGION, operation_id)\n",
    "\n",
    "\n",
    "print('USER: {}'.format(USER))\n",
    "print('PROJECT_ID: {}'.format(PROJECT_ID))\n",
    "print('REGION: {}'.format(REGION))\n",
    "print('STUDY_ID: {}'.format(STUDY_ID))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "gZ_waPW5Aa7w"
   },
   "source": [
    "### Build the API client\n",
    "\n",
    "The following cell builds the auto-generated API client using [Google API discovery service](https://developers.google.com/discovery). The JSON format API schema is hosted in a Cloud Storage bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "4Nz6fd5eAer8"
   },
   "outputs": [],
   "source": [
    "from google.cloud import storage\n",
    "from googleapiclient import discovery\n",
    "\n",
    "\n",
    "_OPTIMIZER_API_DOCUMENT_BUCKET = 'caip-optimizer-public'\n",
    "_OPTIMIZER_API_DOCUMENT_FILE = 'api/ml_public_google_rest_v1.json'\n",
    "\n",
    "\n",
    "def read_api_document():\n",
    "  client = storage.Client(PROJECT_ID)\n",
    "  bucket = client.get_bucket(_OPTIMIZER_API_DOCUMENT_BUCKET)\n",
    "  blob = bucket.get_blob(_OPTIMIZER_API_DOCUMENT_FILE)\n",
    "  return blob.download_as_string()\n",
    "\n",
    "\n",
    "ml = discovery.build_from_document(service=read_api_document())\n",
    "print('Successfully built the client.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "5K4ztIlcv9ZC"
   },
   "source": [
    "### Study configuration\n",
    "\n",
    "In this tutorial, AI Platform Optimizer creates a study and requests trials. For each trial, you will create an AI Platform Training built-in algorithm job to do model training using the suggested hyperparameters. A measurement for each trial is reported as model `accuracy`.\n",
    "\n",
    "The __conditional parameter__ features provided by AI Platform Optimizer define a tree-like search space for hyperparameters. The top-level hyperparameter is __`model_type`__ which is decided between __`LINEAR`__ and __`WIDE_AND_DEEP`__. Each model type has corresponding second level hyperparameters to tune:\n",
    "\n",
    "  - If `model_type` is `LINEAR`, `learning_rate` is tuned.\n",
    "  - If `model_type` is `WIDE_AND_DEEP`, both `learning_rate` and `dnn_learning_rate` are tuned.\n",
    "\n",
    "![A decision tree where model_type is either LINEAR or WIDE_AND_DEEP; LINEAR points to learning_rate and WIDE_AND_DEEP points to both learning_rate and dnn_learning_rate](https://cloud.google.com/ai-platform/images/optimizing-ml-model.png)\n",
    "\n",
    "The following is a sample study configuration, built as a hierarchical python dictionary. It is already filled out. Run the cell to configure the study."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "s-AHfPOASXXW"
   },
   "outputs": [],
   "source": [
    "param_learning_rate = {\n",
    "    'parameter': 'learning_rate',\n",
    "    'type' : 'DOUBLE',\n",
    "    'double_value_spec' : {\n",
    "        'min_value' : 0.00001,\n",
    "        'max_value' : 1.0\n",
    "    },\n",
    "    'scale_type' : 'UNIT_LOG_SCALE',\n",
    "    'parent_categorical_values' : {\n",
    "        'values': ['LINEAR', 'WIDE_AND_DEEP']\n",
    "    },\n",
    "}\n",
    "\n",
    "param_dnn_learning_rate = {\n",
    "    'parameter': 'dnn_learning_rate',\n",
    "    'type' : 'DOUBLE',\n",
    "    'double_value_spec' : {\n",
    "        'min_value' : 0.00001,\n",
    "        'max_value' : 1.0\n",
    "    },\n",
    "    'scale_type' : 'UNIT_LOG_SCALE',\n",
    "    'parent_categorical_values' : {\n",
    "        'values': ['WIDE_AND_DEEP']\n",
    "    },\n",
    "}\n",
    "\n",
    "param_model_type = {\n",
    "    'parameter': 'model_type',\n",
    "    'type' : 'CATEGORICAL',\n",
    "    'categorical_value_spec' : {'values': ['LINEAR', 'WIDE_AND_DEEP']},\n",
    "    'child_parameter_specs' : [param_learning_rate, param_dnn_learning_rate,]\n",
    "}\n",
    "\n",
    "metric_accuracy = {\n",
    "    'metric' : 'accuracy',\n",
    "    'goal' : 'MAXIMIZE'\n",
    "}\n",
    "\n",
    "study_config = {\n",
    "    'algorithm' : 'ALGORITHM_UNSPECIFIED',  # Let the service choose the `default` algorithm.\n",
    "    'parameters' : [param_model_type,], \n",
    "    'metrics' : [metric_accuracy,],\n",
    "}\n",
    "\n",
    "study = {'study_config': study_config}\n",
    "print(json.dumps(study, indent=2, sort_keys=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "p3Q2HkJWwIv0"
   },
   "source": [
    "### Create the study\n",
    "\n",
    "Next, create the study, which you will subsequently run to optimize the objective."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "jgskzqZX0Mkt"
   },
   "outputs": [],
   "source": [
    "# Creates a study\n",
    "req = ml.projects().locations().studies().create(\n",
    "    parent=study_parent(), studyId=STUDY_ID, body=study)\n",
    "try :\n",
    "  print(req.execute())\n",
    "except errors.HttpError as e: \n",
    "  if e.resp.status == 409:\n",
    "    print('Study already existed.')\n",
    "  else:\n",
    "    raise e"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "r4GK3fWTfEJH"
   },
   "source": [
    "### Set input/output parameters\n",
    "\n",
    "Next, set the following output parameters.\n",
    "\n",
    "`BUCKET_NAME` and `OUTPUT_DIR` are the Cloud Storage bucket and directory used as 'job_dir' for AI Platform Training jobs. `BUCKET_NAME` should be a bucket in your project and `OUTPUT_DIR` is the name you want to give to the output folder in your bucket.\n",
    "\n",
    "`job_dir` will be in the format of 'gs://\\$BUCKET_NAME/\\$OUTPUT_DIR/'\n",
    "\n",
    "`TRAINING_DATA_PATH` is the path for the input training dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "nw8HNR0-27vi"
   },
   "outputs": [],
   "source": [
    "# `job_dir` will be `gs://${BUCKET_NAME}/${OUTPUT_DIR}/${job_id}`\n",
    "BUCKET_NAME = '[your-bucket-name]' #@param {type: 'string'}\n",
    "OUTPUT_DIR = '[output-dir]' #@param {type: 'string'}\n",
    "TRAINING_DATA_PATH = 'gs://caip-optimizer-public/sample-data/raw_census_train.csv' #@param {type: 'string'}\n",
    "\n",
    "print('BUCKET_NAME: {}'.format(BUCKET_NAME))\n",
    "print('OUTPUT_DIR: {}'.format(OUTPUT_DIR))\n",
    "print('TRAINING_DATA_PATH: {}'.format(TRAINING_DATA_PATH))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "KY4gCIEnAHRo"
   },
   "outputs": [],
   "source": [
    "# Create the bucket in Cloud Storage\n",
    "! gsutil mb -p $PROJECT_ID gs://$BUCKET_NAME/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Ja8OlVzMwN5V"
   },
   "source": [
    "### Metric evaluation\n",
    "\n",
    "This section defines the methods to do trial evaluation. \n",
    "\n",
    "For each trial, submit an AI Platform built-in algorithm job to train a machine learning model using hyperparameters suggested by AI Platform Optimizer. Each job writes the model summary file into Cloud Storage when the job completes. You can retrieve the model accuracy from the job directory and report it as the `final_measurement` of the trial.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Xnl1uqnyz3Qp"
   },
   "outputs": [],
   "source": [
    "import logging\n",
    "import math\n",
    "import subprocess\n",
    "import os\n",
    "import yaml\n",
    "\n",
    "from google.cloud import storage\n",
    "\n",
    "_TRAINING_JOB_NAME_PATTERN = '{}_condition_parameters_{}_{}'\n",
    "_IMAGE_URIS = {'LINEAR' : 'gcr.io/cloud-ml-algos/linear_learner_cpu:latest', \n",
    "               'WIDE_AND_DEEP' : 'gcr.io/cloud-ml-algos/wide_deep_learner_cpu:latest'}\n",
    "_STEP_COUNT = 'step_count'\n",
    "_ACCURACY = 'accuracy'\n",
    "\n",
    "\n",
    "def EvaluateTrials(trials):\n",
    "  \"\"\"Evaluates trials by submitting training jobs to AI Platform Training service.\n",
    "     \n",
    "     Args:\n",
    "       trials: List of Trials to evaluate\n",
    "\n",
    "     Returns: A dict of <trial_id, measurement> for the given trials.\n",
    "  \"\"\"\n",
    "  trials_by_job_id = {}\n",
    "  mesurement_by_trial_id = {}\n",
    "\n",
    "  # Submits a AI Platform Training job for each trial.\n",
    "  for trial in trials:\n",
    "    trial_id = int(trial['name'].split('/')[-1])\n",
    "    model_type = _GetSuggestedParameterValue(trial, 'model_type', 'stringValue')\n",
    "    learning_rate = _GetSuggestedParameterValue(trial, 'learning_rate', \n",
    "                                                'floatValue')\n",
    "    dnn_learning_rate = _GetSuggestedParameterValue(trial, 'dnn_learning_rate', \n",
    "                                                    'floatValue')\n",
    "    job_id = _GenerateTrainingJobId(model_type=model_type, \n",
    "                                    trial_id=trial_id)\n",
    "    trials_by_job_id[job_id] = {\n",
    "        'trial_id' : trial_id,\n",
    "        'model_type' : model_type,\n",
    "        'learning_rate' : learning_rate,\n",
    "        'dnn_learning_rate' : dnn_learning_rate, \n",
    "    }\n",
    "    _SubmitTrainingJob(job_id, trial_id, model_type, learning_rate, dnn_learning_rate)\n",
    "\n",
    "  # Waits for completion of AI Platform Training jobs.\n",
    "  while not _JobsCompleted(trials_by_job_id.keys()):\n",
    "    time.sleep(60)\n",
    "\n",
    "  # Retrieves model training result(e.g. global_steps, accuracy) for AI Platform Training jobs.\n",
    "  metrics_by_job_id = _GetJobMetrics(trials_by_job_id.keys())\n",
    "  for job_id, metric in metrics_by_job_id.items():\n",
    "    measurement = _CreateMeasurement(trials_by_job_id[job_id]['trial_id'],\n",
    "                                     trials_by_job_id[job_id]['model_type'],\n",
    "                                     trials_by_job_id[job_id]['learning_rate'], \n",
    "                                     trials_by_job_id[job_id]['dnn_learning_rate'],\n",
    "                                     metric)\n",
    "    mesurement_by_trial_id[trials_by_job_id[job_id]['trial_id']] = measurement\n",
    "  return mesurement_by_trial_id\n",
    "\n",
    "\n",
    "def _CreateMeasurement(trial_id, model_type, learning_rate, dnn_learning_rate, metric):\n",
    "  if not metric[_ACCURACY]:\n",
    "    # Returns `none` for trials without metrics. The trial will be marked as `INFEASIBLE`.\n",
    "    return None\n",
    "  print(\n",
    "      'Trial {0}: [model_type = {1}, learning_rate = {2}, dnn_learning_rate = {3}] => accuracy = {4}'.format(\n",
    "          trial_id, model_type, learning_rate, \n",
    "          dnn_learning_rate if dnn_learning_rate else 'N/A', metric[_ACCURACY]))\n",
    "  measurement = {\n",
    "      _STEP_COUNT: metric[_STEP_COUNT], \n",
    "      'metrics': [{'metric': _ACCURACY, 'value': metric[_ACCURACY]},]}\n",
    "  return measurement\n",
    "\n",
    "\n",
    "def _SubmitTrainingJob(job_id, trial_id, model_type, learning_rate, dnn_learning_rate=None):\n",
    "  \"\"\"Submits a built-in algo training job to AI Platform Training Service.\"\"\"\n",
    "  try:\n",
    "    if model_type == 'LINEAR':\n",
    "      subprocess.check_output(_LinearCommand(job_id, learning_rate), stderr=subprocess.STDOUT)\n",
    "    elif model_type == 'WIDE_AND_DEEP':\n",
    "      subprocess.check_output(_WideAndDeepCommand(job_id, learning_rate, dnn_learning_rate), stderr=subprocess.STDOUT)\n",
    "    print('Trial {0}: Submitted job [https://console.cloud.google.com/ai-platform/jobs/{1}?project={2}].'.format(trial_id, job_id, PROJECT_ID))\n",
    "  except subprocess.CalledProcessError as e:\n",
    "    logging.error(e.output)\n",
    "\n",
    "\n",
    "def _GetTrainingJobState(job_id):\n",
    "  \"\"\"Gets a training job state.\"\"\"\n",
    "  cmd = ['gcloud', 'ai-platform', 'jobs', 'describe', job_id, \n",
    "         '--project', PROJECT_ID, \n",
    "         '--format', 'json']\n",
    "  try:\n",
    "    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT, timeout=3)\n",
    "  except subprocess.CalledProcessError as e:\n",
    "    logging.error(e.output)\n",
    "  return json.loads(output)['state']\n",
    "\n",
    "\n",
    "def _JobsCompleted(jobs):\n",
    "  \"\"\"Checks if all the jobs are completed.\"\"\"\n",
    "  all_done = True\n",
    "  for job in jobs:\n",
    "    if _GetTrainingJobState(job) not in ['SUCCEEDED', 'FAILED', 'CANCELLED']:\n",
    "      print('Waiting for job[https://console.cloud.google.com/ai-platform/jobs/{0}?project={1}] to finish...'.format(job, PROJECT_ID))\n",
    "      all_done = False\n",
    "  return all_done\n",
    "\n",
    "\n",
    "def _RetrieveAccuracy(job_id):\n",
    "  \"\"\"Retrices the accuracy of the trained model for a built-in algorithm job.\"\"\"\n",
    "  storage_client = storage.Client(project=PROJECT_ID)\n",
    "  bucket = storage_client.get_bucket(BUCKET_NAME)\n",
    "  blob_name = os.path.join(OUTPUT_DIR, job_id, 'model/deployment_config.yaml')\n",
    "  blob = storage.Blob(blob_name, bucket)\n",
    "  try: \n",
    "    blob.reload()\n",
    "    content = blob.download_as_string()\n",
    "    accuracy = float(yaml.safe_load(content)['labels']['accuracy']) / 100\n",
    "    step_count = int(yaml.safe_load(content)['labels']['global_step'])\n",
    "    return {_STEP_COUNT: step_count, _ACCURACY: accuracy}\n",
    "  except:\n",
    "    # Returns None if failed to load the built-in algo output file.\n",
    "    # It could be due to job failure and the trial will be `INFEASIBLE`\n",
    "    return None\n",
    "\n",
    "\n",
    "def _GetJobMetrics(jobs):\n",
    "  accuracies_by_job_id = {}\n",
    "  for job in jobs:\n",
    "    accuracies_by_job_id[job] = _RetrieveAccuracy(job)\n",
    "  return accuracies_by_job_id\n",
    "\n",
    "\n",
    "def _GetSuggestedParameterValue(trial, parameter, value_type):\n",
    "  param_found = [p for p in trial['parameters'] if p['parameter'] == parameter]\n",
    "  if param_found:\n",
    "    return param_found[0][value_type]\n",
    "  else: \n",
    "    return None\n",
    "\n",
    "\n",
    "def _GenerateTrainingJobId(model_type, trial_id):\n",
    "  return _TRAINING_JOB_NAME_PATTERN.format(STUDY_ID, model_type, trial_id)\n",
    "\n",
    "\n",
    "def _GetJobDir(job_id):\n",
    "  return os.path.join('gs://', BUCKET_NAME, OUTPUT_DIR, job_id)\n",
    "\n",
    "\n",
    "def _LinearCommand(job_id, learning_rate):\n",
    "  return ['gcloud', 'ai-platform', 'jobs', 'submit', 'training', job_id,\n",
    "          '--scale-tier', 'BASIC',\n",
    "          '--region', 'us-central1',\n",
    "          '--master-image-uri', _IMAGE_URIS['LINEAR'],\n",
    "          '--project', PROJECT_ID,\n",
    "          '--job-dir', _GetJobDir(job_id),\n",
    "          '--',\n",
    "          '--preprocess',\n",
    "          '--model_type=classification',\n",
    "          '--batch_size=250',\n",
    "          '--max_steps=1000',\n",
    "          '--learning_rate={}'.format(learning_rate),\n",
    "          '--training_data_path={}'.format(TRAINING_DATA_PATH)]\n",
    " \n",
    " \n",
    "def _WideAndDeepCommand(job_id, learning_rate, dnn_learning_rate):\n",
    "  return ['gcloud', 'ai-platform', 'jobs', 'submit', 'training', job_id,\n",
    "          '--scale-tier', 'BASIC',\n",
    "          '--region', 'us-central1',\n",
    "          '--master-image-uri', _IMAGE_URIS['WIDE_AND_DEEP'],\n",
    "          '--project', PROJECT_ID,\n",
    "          '--job-dir', _GetJobDir(job_id),\n",
    "          '--',\n",
    "          '--preprocess',\n",
    "          '--test_split=0',\n",
    "          '--use_wide',\n",
    "          '--embed_categories',\n",
    "          '--model_type=classification',\n",
    "          '--batch_size=250',\n",
    "          '--learning_rate={}'.format(learning_rate),\n",
    "          '--dnn_learning_rate={}'.format(dnn_learning_rate),\n",
    "          '--max_steps=1000',\n",
    "          '--training_data_path={}'.format(TRAINING_DATA_PATH)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "h09cPYkaJr3X"
   },
   "source": [
    "### Configuration for requesting suggestions/trials\n",
    "\n",
    "__`client_id`__ - The identifier of the client that is requesting the suggestion. If multiple SuggestTrialsRequests have the same `client_id`, the service will return the identical suggested trial if the trial is `PENDING`, and provide a new trial if the last suggested trial was completed.\n",
    "\n",
    "__`suggestion_count_per_request`__ - The number of suggestions (trials) requested in a single request.\n",
    "\n",
    "__`max_trial_id_to_stop`__ - The number of trials to explore before stopping. It is set to 4 to shorten the time to run the code, so don't expect convergence. For convergence, it would likely need to be about 20 (a good rule of thumb is to multiply the total dimensionality by 10).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {},
    "colab_type": "code",
    "id": "zEXsB9ObJtd9"
   },
   "outputs": [],
   "source": [
    "client_id = 'client1' #@param {type: 'string'}\n",
    "suggestion_count_per_request =   2 #@param {type: 'integer'}\n",
    "max_trial_id_to_stop =   4 #@param {type: 'integer'}\n",
    "\n",
    "print('client_id: {}'.format(client_id))\n",
    "print('suggestion_count_per_request: {}'.format(suggestion_count_per_request))\n",
    "print('max_trial_id_to_stop: {}'.format(max_trial_id_to_stop))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "6M5zZviGqwka"
   },
   "source": [
    "### Request and run AI Platform Optimizer trials\n",
    "\n",
    "Run the trials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "opmuTntW4-eS"
   },
   "outputs": [],
   "source": [
    "current_trial_id = 0\n",
    "while current_trial_id < max_trial_id_to_stop:\n",
    "  # Request trials\n",
    "  resp = ml.projects().locations().studies().trials().suggest(\n",
    "    parent=trial_parent(STUDY_ID), \n",
    "    body={'client_id': client_id, 'suggestion_count': suggestion_count_per_request}).execute()\n",
    "  op_id = resp['name'].split('/')[-1]\n",
    "\n",
    "  # Polls the suggestion long-running operations.\n",
    "  get_op = ml.projects().locations().operations().get(name=operation_name(op_id))\n",
    "  while True:\n",
    "      operation = get_op.execute()\n",
    "      if 'done' in operation and operation['done']:\n",
    "        break\n",
    "      time.sleep(1)\n",
    "  \n",
    "  # Featches the suggested trials.\n",
    "  trials = []\n",
    "  for suggested_trial in get_op.execute()['response']['trials']:\n",
    "    trial_id = int(suggested_trial['name'].split('/')[-1])\n",
    "    trial = ml.projects().locations().studies().trials().get(name=trial_name(STUDY_ID, trial_id)).execute()\n",
    "    if trial['state'] not in ['COMPLETED', 'INFEASIBLE']:\n",
    "      print(\"Trial {}: {}\".format(trial_id, trial))\n",
    "      trials.append(trial)\n",
    "    \n",
    "  # Evaluates trials - Submit model training jobs using AI Platform Training built-in algorithms.\n",
    "  measurement_by_trial_id = EvaluateTrials(trials)\n",
    "\n",
    "  # Completes trials.\n",
    "  for trial in trials:\n",
    "    trial_id = int(trial['name'].split('/')[-1])\n",
    "    current_trial_id = trial_id\n",
    "    measurement = measurement_by_trial_id[trial_id]\n",
    "    print((\"=========== Complete Trial: [{0}] =============\").format(trial_id))\n",
    "    if measurement:\n",
    "      # Completes trial by reporting final measurement.\n",
    "      ml.projects().locations().studies().trials().complete(\n",
    "        name=trial_name(STUDY_ID, trial_id), \n",
    "        body={'final_measurement' : measurement}).execute()\n",
    "    else:\n",
    "      # Marks trial as `infeasible` when missing final measurement.\n",
    "      ml.projects().locations().studies().trials().complete(\n",
    "        name=trial_name(STUDY_ID, trial_id), \n",
    "        body={'trial_infeasible' : True}).execute()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "mJINN_7oVn64"
   },
   "source": [
    "### [OPTIONAL] Create trials using your own parameters\n",
    "\n",
    "Besides requesting suggestions (the `suggest` method) of parameters from the service, AI Platform Optimizer's API also lets users create trials (the `create` method) using their own parameters. AI Platform Optimizer will help bookkeep the experiments done by users and take the knowledge to generate new suggestions.\n",
    "\n",
    "For example, if you run a model training job using your own `model_type` and `learning_rate` instead of the ones suggested by AI Platform Optimizer, you can create a trial for it as part of the study. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "dzT_aRv2Ug_A"
   },
   "outputs": [],
   "source": [
    "# User has to leave `trial.name` unset in CreateTrial request, the service will \n",
    "# assign it. \n",
    "custom_trial = {\n",
    "  \"clientId\": \"client1\",\n",
    "  \"finalMeasurement\": {\n",
    "    \"metrics\": [\n",
    "      {\n",
    "        \"metric\": \"accuracy\",\n",
    "        \"value\": 0.86\n",
    "      }\n",
    "    ],\n",
    "    \"stepCount\": \"1000\"\n",
    "  },\n",
    "  \"parameters\": [\n",
    "    {\n",
    "      \"parameter\": \"model_type\",\n",
    "      \"stringValue\": \"LINEAR\"\n",
    "    },\n",
    "    {\n",
    "      \"floatValue\": 0.3869103706121445,\n",
    "      \"parameter\": \"learning_rate\"\n",
    "    }\n",
    "  ],\n",
    "  \"state\": \"COMPLETED\"\n",
    "}\n",
    "\n",
    "trial = ml.projects().locations().studies().trials().create(\n",
    "    parent=trial_parent(STUDY_ID), body=custom_trial).execute()\n",
    "\n",
    "print(json.dumps(trial, indent=2, sort_keys=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ZShOQGgCCXIo"
   },
   "source": [
    "### List the trials\n",
    "\n",
    "List the results of each optimization trial training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "PKwnbG9QCYh5"
   },
   "outputs": [],
   "source": [
    "resp = ml.projects().locations().studies().trials().list(parent=trial_parent(STUDY_ID)).execute()\n",
    "print(json.dumps(resp, indent=2, sort_keys=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "6j54X9UUfEJZ"
   },
   "source": [
    "## Cleaning up\n",
    "\n",
    "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
    "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "ai_platform_optimizer_conditional_parameters.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
